Neural Networks
○ Elsevier BV
All preprints, ranked by how well they match the content profile of Neural Networks, based on 32 papers previously published in the journal. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.
Doty, B.; Mihalas, S.; Arkhipov, A.; Piet, A.
Deep convolutional neural networks (CNNs) are powerful computational tools for a large variety of tasks (Goodfellow, 2016). Their architecture, composed of layers of repeated identical neural units, draws inspiration from visual neuroscience. However, biological circuits contain a myriad of additional details and complexity not translated to CNNs, including diverse neural cell types (Tasic, 2018). Many possible roles for neural cell types have been proposed, including learning, stabilizing excitation and inhibition, and diverse normalization (Marblestone, 2016; Gouwens, 2019). Here we investigate whether neural cell types, instantiated as diverse activation functions in CNNs, can assist the feed-forward computations of neural circuits. Our heterogeneous cell-type networks mix multiple activation functions within each activation layer. We assess the value of mixed activation functions by comparing image classification performance to that of homogeneous control networks with only one activation function per network. We observe that mixing activation functions can improve the image classification abilities of CNNs. Importantly, we find larger improvements when the activation functions are more diverse and in more constrained networks. Our results suggest a feed-forward computational role for diverse cell types in biological circuits. Additionally, our results open new avenues for the development of more powerful CNNs.
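The mixed-activation idea above lends itself to a compact sketch. The following numpy illustration is not the authors' code; the particular activation mix, group sizes, and names are illustrative assumptions. It shows a heterogeneous activation layer that splits its channels across several activation functions, versus a homogeneous control that would apply a single function everywhere.

```python
import numpy as np

# Illustrative activation mix; a homogeneous control network would use
# a single entry in this list.
ACTIVATIONS = [
    lambda x: np.maximum(x, 0.0),                    # ReLU
    np.tanh,                                         # saturating unit
    lambda x: np.where(x > 0, x, np.exp(x) - 1.0),   # ELU
]

def mixed_activation(feature_maps):
    """Apply a different activation to each contiguous group of channels.

    feature_maps: array of shape (channels, height, width).
    """
    groups = np.array_split(np.arange(feature_maps.shape[0]), len(ACTIVATIONS))
    out = np.empty_like(feature_maps, dtype=float)
    for act, idx in zip(ACTIVATIONS, groups):
        out[idx] = act(feature_maps[idx])
    return out
```

In this sketch the "cell types" differ only in their input-output nonlinearity; the paper's comparison then amounts to training otherwise identical networks with one versus several entries in the activation list.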
Zhao, Q.; Xu, J.; Li, D.; Wu, X.; Zhang, K.; Chu, C.; Fan, L.
NeuroAI explores the interplay of neuroscience and artificial intelligence, especially in visual processing. The human visual system organizes objects according to a representational hierarchy. However, it remains unclear whether this hierarchy arises from visual or semantic information. One hypothesis posits that the visual system is structured around statistical regularities of visual information. Here, we test this hypothesis using the THINGS datasets and purely visual deep neural networks (DNNs). We constructed a low-dimensional object space based on multiple abstract object properties, reflecting statistical patterns of visual regularities. By applying voxelwise encoding models, we identified clusters in the higher visual cortex based on their property tuning, and these clusters were found to support specific object categories. The clusters serve as the middle level of a property-cluster-object hierarchical organization. Subsequently, we investigated whether this hierarchical structure could be captured by a self-supervised DNN. Through activity similarity analysis, we mapped the brain clusters onto the DNN and independently found that the DNN's clusters exhibited distinct property tuning and influenced the classification accuracy of the corresponding object categories, mirroring the effects observed in the human brain. Our results demonstrate similar hierarchical structures in the human brain and a self-supervised DNN, suggesting that visual regularities shape the neural architecture of the visual system. This study highlights the great potential of neural computational models in neuroscience. Index terms: visual processing, abstract property, hierarchical representation, self-supervised visual DNN.
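The voxelwise encoding step mentioned above is commonly implemented as a regularized linear regression from stimulus properties to each voxel's response. The sketch below is a generic illustration of that technique, not the authors' pipeline; the feature and voxel dimensions and the closed-form ridge solver are assumptions.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression.

    X: (n_stimuli, n_features) object-property features.
    Y: (n_stimuli, n_voxels) voxel responses.
    Returns W of shape (n_features, n_voxels).
    """
    n_feat = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_feat), X.T @ Y)

def predict(X, W):
    """Predicted voxel responses for new stimuli."""
    return X @ W
```

Property tuning of a voxel (or a cluster of voxels) can then be read off the columns of `W`: the features with the largest weights for that voxel.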
Qi, F.; Wu, W.
How the animal nervous system solves the identity-preserving object recognition problem is largely unknown. Artificial neural networks such as convolutional neural networks (CNNs) have reached human-level performance in recognition tasks; however, the animal nervous system does not support the kernel-scanning operation that CNNs apply across retinal neurons, and thus neuronal responses do not match those of CNN units. Here, we used an alternative recognition-reconstruction network (RRN) architecture as an analogy to an animal-like system, and the resulting neural characteristics agreed fairly well with electrophysiological measurements in monkey studies. First, in a network development study, the RRN also experienced critical developmental stages characterized by specificities in neuronal types, connectivity strength, and firing pattern, from an early stage of coarse salience-map recognition to a mature stage of fine structure recognition. In a digit recognition study, we observed that the RRN could maintain an invariant object representation under various viewing conditions through coordinated adjustment of the responses of population neurons. Such concerted population responses contained untangled object identity and property information that could be accurately extracted via a simple weighted-summation decoder. In a learning and forgetting study, novel structure recognition was implemented by adjusting all synapses by a small magnitude while the pattern specificities of the original synaptic connectivity were preserved, which guaranteed a learning process that did not disrupt existing functionalities. This work benefits the understanding of human neural mechanisms and the development of human-like intelligence.
Khan, S.; Wong, A.; Tripp, B.
Under difficult viewing conditions, the brain's visual system uses a variety of recurrent modulatory mechanisms to augment feed-forward processing. One resulting phenomenon is contour integration, which occurs in the primary visual cortex (V1) and strengthens neural responses to edges if they belong to a larger smooth contour. Computational models have contributed to an understanding of the circuit mechanisms of contour integration, but less is known about its role in visual perception. To address this gap, we embedded a biologically grounded model of contour integration in a task-driven artificial neural network and trained it using a gradient-descent variant. We used this model to explore how brain-like contour integration may be optimized for high-level visual objectives, as well as its potential roles in perception. When the model was trained to detect contours in a background of random edges, a task commonly used to examine contour integration in the brain, it closely mirrored the brain in terms of behavior, neural responses, and lateral connection patterns. When trained on natural images, the model enhanced weaker contours and distinguished whether two points lay on the same or different contours. The model learnt robust features that generalized well to out-of-training-distribution stimuli. Surprisingly, and in contrast with the synthetic task, a parameter-matched control network without recurrence performed the same or better than the model on the natural-image tasks. Thus a contour integration mechanism is not essential for these more naturalistic contour-related tasks. Finally, the best performance in all tasks was achieved by a modified contour integration model that did not distinguish between excitatory and inhibitory neurons. Author summary: Deep networks are machine-learning systems that consist of interconnected neuron-like elements. More than other kinds of artificial system, they rival human information processing in a variety of tasks.
These structural and functional parallels have raised interest in using deep networks as simplified models of the brain, to better understand brain function. For example, incorporating additional biological phenomena into deep networks may help to clarify how they affect brain function. In this direction, we adapted a deep network to incorporate a model of visual contour integration, a process in the brain that makes contours appear more visually prominent. We found that suitable training led this model to behave much like the corresponding brain circuits. We then investigated potential roles of the contour integration mechanism in the processing of natural images, an important question that has been difficult to answer. The results were not straightforward. For example, the contour integration mechanism actually impaired the network's ability to tell whether two points lay on the same contour or not, but improved the network's ability to generalize this skill to a different group of images. Overall, this approach has raised more sophisticated questions about the role of contour integration in natural vision.
Trpevski, D.
Synaptic plasticity has been shown to occur when calcium, flowing into the synapse due to incoming stimuli, surpasses a threshold level. This threshold level is modifiable through a process called metaplasticity. Some neurons, such as the striatal projection neurons, use different sources of calcium as the signal for synaptic strengthening (long-term potentiation, LTP) or weakening (long-term depression, LTD), resulting in them having two thresholds for inducing plasticity. In this study, we show opposite and complementary roles of metaplasticity in these two thresholds for inducing LTP and LTD when learning how to solve the linear and nonlinear feature binding problems (FBP and NFBP). In short, metaplasticity in one threshold (e.g. LTD) allows synaptic plasticity of the opposite type (e.g. LTP) to be properly expressed. This happens because metaplasticity in the LTD threshold protects strengthened synapses from weakening, thus allowing them to persistently increase during learning (and encode learned patterns). Similarly, metaplasticity in the LTP threshold prevents weakened synapses from strengthening, thus allowing them to persistently decrease. Metaplasticity in both thresholds is necessary when synapses are clustered and the neuron needs to rely on supralinear dendritic integration for learning.
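The two-threshold rule with metaplasticity can be sketched in a few lines. This is a toy stand-in, not the paper's model: the threshold values, learning rate, and the particular metaplasticity increments are illustrative assumptions chosen only to show the protective logic described above.

```python
import numpy as np

class Synapse:
    """Toy calcium-threshold plasticity with metaplasticity in both thresholds."""

    def __init__(self, w=0.5, theta_ltp=1.0, theta_ltd=0.4):
        self.w = w
        self.theta_ltp = theta_ltp   # calcium threshold for potentiation (LTP)
        self.theta_ltd = theta_ltd   # calcium threshold for depression (LTD)

    def update(self, ca, eta=0.05):
        if ca >= self.theta_ltp:
            self.w += eta
            # Metaplasticity in the LTD threshold: a potentiated synapse
            # becomes harder to depress, protecting the learned increase.
            self.theta_ltd += 0.5 * eta
        elif ca >= self.theta_ltd:
            self.w -= eta
            # Metaplasticity in the LTP threshold: a depressed synapse
            # becomes harder to potentiate, protecting the learned decrease.
            self.theta_ltp += 0.5 * eta
        return self.w
```

Repeated supra-LTP calcium transients then both strengthen the synapse and raise its LTD threshold, so later mid-range calcium no longer erases the stored pattern.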
Yoshihara, S.; Fukiage, T.; Nishida, S.
It is suggested that experiences of perceiving blurry images in addition to sharp images contribute to the development of robust human visual processing. To computationally investigate the effect of exposure to blurry images, we trained convolutional neural networks (CNNs) on ImageNet object recognition with a variety of combinations of sharp and blurry images. In agreement with related studies, mixed training on sharp and blurred images (B+S) makes the CNNs close to humans with respect to robust object recognition against a change in image blur. B+S training also reduces the texture bias of CNNs in the recognition of shape-texture cue-conflict images, but the effect is not strong enough to achieve a shape bias comparable to what humans show. Other tests also suggest that B+S training is not sufficient to produce robust human-like object recognition based on global configurational features. Using representational similarity analysis and zero-shot transfer learning, we also show that B+S-Net does not acquire blur-robust object recognition through separate specialized sub-networks for sharp and for blurry images, but through a single network analyzing common image features. However, blur training alone does not automatically create a mechanism like that of the human brain, in which subband information is integrated into a common representation. Our analyses suggest that experience with blurred images helps the human brain develop neural networks that robustly recognize the surrounding world, but it is not powerful enough to fill the large gap between humans and CNNs.
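The B+S-style data preparation can be sketched as a stochastic blur augmentation. The sketch below is an illustration only, not the authors' training code; the kernel size, sigma, blur probability, and naive convolution are assumptions.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.5):
    """Normalized 2D Gaussian kernel (illustrative size and sigma)."""
    ax = np.arange(size) - size // 2
    k = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def blur(image, kernel):
    """Naive 2D convolution with zero padding (grayscale image)."""
    pad = kernel.shape[0] // 2
    padded = np.pad(image, pad)
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            window = padded[i:i + kernel.shape[0], j:j + kernel.shape[1]]
            out[i, j] = (window * kernel).sum()
    return out

def sample_training_image(image, p_blur=0.5, rng=None):
    """With probability p_blur, present a blurred version of the image."""
    rng = rng or np.random.default_rng()
    return blur(image, gaussian_kernel()) if rng.random() < p_blur else image
```

Sharp-only or blur-only training regimes correspond to `p_blur=0.0` and `p_blur=1.0`, with the mixed B+S condition in between.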
Truong, N.; Noei, S.; Karami, A.
Convolutional neural networks (CNNs) have become essential models for predicting neural activity and behavior in visual tasks. However, their ability to capture higher-level cognitive functions, such as numerosity discrimination, remains debated. Numerosity, the ability to perceive and estimate the number of items in a visual scene, is often proposed to rely on specialized number-detector units within CNNs, analogous to number-selective neurons observed in the brain. In this study, we use CORnet, a CNN architecture inspired by the organization of the primate visual system. To address a limitation of classical Representational Similarity Analysis (RSA)--its assumption that all units contribute equally--we apply pruning, a feature selection approach that identifies the units most relevant for explaining the behavioral similarity structure. Our results show that number-detector units are not critical for population-level representations of numerosity, challenging their proposed role in previous studies.
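The RSA-plus-pruning logic can be illustrated with a minimal drop-one-unit analysis: build a representational dissimilarity matrix (RDM) from unit responses, score it against a target (behavioral) RDM, and ask how much each unit contributes. This is a simplified stand-in for the paper's pruning procedure, with made-up names and a basic 1 - correlation dissimilarity.

```python
import numpy as np

def rdm(responses):
    """Condensed RDM: pairwise 1 - correlation over stimuli.

    responses: (n_stimuli, n_units) array.
    """
    c = np.corrcoef(responses)
    iu = np.triu_indices(len(c), k=1)
    return 1.0 - c[iu]

def rsa_score(responses, target_rdm):
    """Correlation between the model RDM and a target RDM."""
    return np.corrcoef(rdm(responses), target_rdm)[0, 1]

def unit_importance(responses, target_rdm):
    """Drop-one-unit change in RSA score; higher means more important."""
    base = rsa_score(responses, target_rdm)
    keep = np.arange(responses.shape[1])
    return np.array([base - rsa_score(responses[:, keep != u], target_rdm)
                     for u in keep])
```

Pruning then keeps the high-importance units; if putative number-detector units can be removed without hurting the score, they are not critical for the population-level representation.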
Pham, T. Q.; Yoshimoto, T.; Niwa, H.; Takahashi, H. K.; Uchiyama, R.; Matsui, T.; Anderson, A. K.; Sadato, N.; Chikazoe, J.
Humans, and now computers, can derive subjective valuations from sensory events, although the underlying transformation process is essentially unknown. In this study, we elucidated these neural mechanisms by comparing convolutional neural networks (CNNs) to their corresponding representations in humans. Specifically, we optimized CNNs to predict aesthetic valuations of paintings and examined the relationship between the CNN representations and brain activity via multivoxel pattern analysis. Primary visual cortex and higher association cortex activities were similar to computations in shallow and deeper CNN layers, respectively. The vision-to-value transformation is thus shown to be a hierarchical process, consistent with the principal gradient that connects unimodal to transmodal brain regions (i.e., the default mode network). The activity of the frontal and parietal cortices was approximated by a goal-driven CNN. Consequently, representations in the hidden layers of CNNs can be understood and visualized through their correspondence with brain activity, facilitating parallels between artificial intelligence and neuroscience.
Gundavarapu, A.; Chakravarthy, S.
A breakthrough in the understanding of dynamic 3D shape recognition was the discovery that our visual system can extract 3D shape from inputs having only sparse motion cues, such as (i) point-light displays and (ii) random-dot displays representing rotating 3D shapes - phenomena known as biological motion (BM) processing and structure from motion (SFM), respectively. Previous psychological and computational modeling studies viewed these two as separate phenomena and could not fully identify the shared visual processing mechanisms underlying them. Using a series of simulation studies, we describe the operations of a dynamic deep network model to explain the mechanisms underlying both SFM and BM processing. In simulation 1, the proposed Structure from Motion Network (SFMNW) is trained using displays of 5 rotating surfaces (cylinder, cone, ellipsoid, sphere, and helix) and tested on its shape recognition performance under a variety of conditions: (i) varying dot density, (ii) eliminating local feature stability by introducing a finite dot lifetime, (iii) orienting shapes, (iv) occluding boundaries and intrinsic surfaces, and (v) embedding shapes in static and dynamic noise backgrounds. Our results indicate that lower dot density, oriented shapes, occluded boundaries, and dynamic noise backgrounds reduced the model's performance, whereas eliminating local feature stability, occluding intrinsic surfaces, and static noise backgrounds had little effect on shape recognition, suggesting that the motion of high-curvature regions such as shape boundaries provides strong cues for shape recognition. In simulation 2, the proposed Biological Motion Network (BMNW) is trained using 6 point-light actions (crawl, cycle, walk, jump, wave, and salute) and tested on its action recognition performance under various conditions: (i) inverted, (ii) scrambled, (iii) tilted, and (iv) masked actions, and (v) actions embedded in static and dynamic noise backgrounds.
Model performance dropped significantly for inverted and tilted actions. On the other hand, better accuracy was attained in distinguishing scrambled and masked actions, and actions performed against static and dynamic noise backgrounds, suggesting that critical joint movements and the movement pattern generated in the course of an action (actor configuration) play a key role in action recognition. We also presented the two models with mixed stimuli (point-light actions embedded in rotating shapes) and achieved significantly high accuracies. Based on these results, we hypothesize that the visual motion circuitry supporting robust SFM processing is also involved in BM processing. The proposed models provide new insights into the relationships between the two visual motion phenomena, viz., SFM and BM processing.
Tamura, H.
Neurons in the cerebral cortex are organized topographically. In the primate visual cortex, neighboring neurons often respond to similar stimulus parameters, such as receptive field position, orientation, color, and spatial frequency. Preferred stimulus parameters change smoothly across the cortical surface. If such topographic organization plays an important role in computation, it is likely to emerge in artificial neural networks. In this study, a multistream convolutional neural network was constructed in which filters in the first convolutional layer were arranged in a two-dimensional filter matrix according to their output connections. The network was trained using supervised learning for image classification. Although adjacent filters in the filter matrix can in principle develop any structure, they acquire similar degrees of orientation and color selectivity. Moreover, they prefer similar orientations, hues, and spatial frequencies. The similarity decreases with the distance between filters in the matrix. Furthermore, neural-network model instances with a strong relationship between filter distance and filter-property similarity performed better than those with a weak relationship. These results suggest that topographic organization emerges spontaneously in an artificial neural network and plays an important role in model performance, underscoring the importance of topographic organization for computations performed by artificial and biological neural networks.
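The distance-similarity relationship reported above can be quantified with a simple correlation over filter pairs. The sketch below is a generic illustration, not the paper's analysis code; the single scalar "preferred property" per filter is an assumption (the study examines several properties, such as orientation and hue).

```python
import numpy as np

def distance_similarity_correlation(positions, properties):
    """Correlation between pairwise filter distance and property similarity.

    positions: (n, 2) coordinates of filters in the 2D filter matrix.
    properties: (n,) preferred property value per filter.
    Strongly negative values indicate a smooth topographic map
    (greater distance, lower similarity).
    """
    iu = np.triu_indices(len(positions), k=1)
    dist = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)[iu]
    sim = -np.abs(properties[:, None] - properties[None, :])[iu]
    return np.corrcoef(dist, sim)[0, 1]
```

Comparing this statistic across trained model instances mirrors the paper's comparison of instances with strong versus weak distance-similarity relationships.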
Kraus, M. K.; Verkerk, L.; Keemink, S. W.
Perceptual illusions are widely used to study brain processing and are essential for elucidating underlying function. Successful brain models should then also be able to reproduce these illusions. Among the most successful models of vision are several variants of deep neural networks (DNNs). These models can classify images with human-level accuracy, and many of their behavioral and activation measurements correlate well with those of humans and animals. Several networks have also been shown to reproduce some human illusions, but typically only a limited number of networks was tested. In addition, it remains unclear whether the presence of illusions is linked to how accurate or how brain-like a DNN is. Here, we consider the scintillating grid illusion, to which two DNNs have previously been shown to respond as if impacted by the illusion. We develop a measure of illusion strength based on model activation correlations, which takes into account the difference in response between illusion and control images. We then compare illusion strength to both model performance (top-1 ImageNet accuracy) and how well the model explains brain activity (Brain-Score). We show that the illusion was measurable in a wide variety of networks (41 out of 51). However, we do not find a strong correlation between illusion strength and Brain-Score, nor with performance. Some models have strong illusion scores but not Brain-Scores, or vice versa, but no model does both well. Finally, results differ strongly between model types, particularly between convolutional and transformer-based architectures, with transformers having low illusion scores. Overall, our work shows that illusion strength is an important metric to consider when assessing brain models, and that some models may still be missing processing that is important for brain function.
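One possible formalization of an activation-correlation illusion measure is sketched below. This is an assumption-laden stand-in, not the paper's exact metric: it compares a model's activations for an illusion image and a matched control image against activations for a reference image that physically contains the illusory percept, and takes the gap between the two correlations.

```python
import numpy as np

def activation_corr(a, b):
    """Pearson correlation between two flattened activation patterns."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

def illusion_strength(act_illusion, act_control, act_percept):
    """Gap between illusion-vs-percept and control-vs-percept similarity.

    A positive value suggests the model responds to the illusion image
    more like the physically-present percept than the control warrants.
    """
    return (activation_corr(act_illusion, act_percept)
            - activation_corr(act_control, act_percept))
```

The control-image subtraction is the key ingredient: it discounts similarity that is driven by shared low-level image content rather than by the illusory percept itself.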
Storrs, K. R.; Khaligh-Razavi, S.-M.; Kriegeskorte, N.
An error was made in including noise ceilings for human data in Khaligh-Razavi and Kriegeskorte (2014). For comparability with the macaque data, human data were averaged across participants before analysis. Therefore the noise ceilings indicating variability across human participants do not accurately depict the upper bounds of possible model performance and should not have been shown. Creating noise ceilings appropriate for the fitted models is not trivial. Below we present a method for doing this, and the results obtained with this new method. The corrected results differ from the original results in that the best-performing model (weighted combination of AlexNet layers and category readouts) does not reach the lower bound of the noise ceiling. However, the best-performing model is not significantly below the lower bound of the noise ceiling. The claim that the model "fully explains" the human IT data appears overstated. All other claims of the paper are unaffected.
Ward, E. J.
Perceptual illusions--discrepancies between what exists externally and what we actually see--reveal a great deal about how the perceptual system functions. Rather than failures of perception, illusions expose automatic computations and biases in visual processing that help us make better decisions from visual information to achieve our perceptual goals. Recognizing objects is one such perceptual goal that is shared between humans and certain deep convolutional neural networks, which can reach human-level performance. Do neural networks trained exclusively for object recognition "perceive" visual illusions, simply as a result of solving this one perceptual problem? Here, I showed four classic illusions to humans and a pre-trained neural network to see if the network exhibits similar perceptual biases. I found that deep neural networks trained exclusively for object recognition exhibit the Müller-Lyer illusion, but not other illusions. This result shows that some human-like perceptual computations may come "for free" in a system with human-like perceptual goals.
Chu, T.; Wu, Y.; Qiu, W.; Jiang, Z.; Burgess, N.; Hong, B.; Wu, S.
Localized space coding and phase coding are two distinct strategies responsible, respectively, for representing abstract structure and sensory observations in neural cognitive maps. In spatial representation, localized space coding is implemented by place cells in the hippocampus (HPC), while phase coding is implemented by grid cells in the medial entorhinal cortex (MEC). Both strategies have their own advantages and disadvantages, and neither of them meets the requirement of representing space robustly and efficiently in the brain. Here, we show that through reciprocal connections between HPC and MEC, place and grid cells can complement each other to overcome their respective shortcomings. Specifically, we build a coupled network model, in which a continuous attractor neural network (CANN) with a position coordinate models place cells, while multiple CANNs with phase coordinates model grid cell modules with varying spacings. The reciprocal connections between place and grid cells encode the correlation prior between the sensory cues processed by HPC and MEC, respectively. Using this model, we show that: 1) place and grid cells interact to integrate sensory cues in a Bayesian manner; 2) place cells complement grid cells in coding accuracy by eliminating non-local errors of the latter; 3) grid cells complement place cells in coding efficiency by enlarging the number of environmental maps stored stably by the latter. We demonstrate that the coupled network model explains the seemingly contradictory experimental findings about the remapping phenomena of place cells when grid cells are either inactivated or depolarized. This study provides insight into how the brain employs collaborative localized and phase coding to realize both robust and efficient information representation.
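The core property of phase coding, and why it needs complementing, can be shown with a toy example (this is not the CANN model above; spacings, ranges, and the brute-force decoder are illustrative assumptions). Each grid module represents position only up to its spacing, so a single module is ambiguous, while combining modules of different spacings pins the position down uniquely.

```python
import numpy as np

SPACINGS = [3.0, 4.0, 5.0]   # illustrative grid-module spacings

def encode(position):
    """Phase code: position modulo each module's spacing."""
    return [position % s for s in SPACINGS]

def decode(phases, max_pos=60.0, step=0.01):
    """Brute-force search for the position most consistent with all phases."""
    candidates = np.arange(0.0, max_pos, step)
    err = sum(np.minimum((candidates - p) % s, (p - candidates) % s) ** 2
              for p, s in zip(phases, SPACINGS))
    return candidates[np.argmin(err)]
```

A small phase error in one module can flip the decoded position to a distant candidate (a non-local error), which is exactly the failure mode the abstract says place cells correct.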
Raman, R.; Hosoya, H.
Recent computational studies have emphasized layer-wise quantitative similarity between convolutional neural networks (CNNs) and the primate ventral visual stream. However, whether such similarity holds for the face-selective areas, a subsystem of the higher visual cortex, is not clear. Here, we extensively investigate whether CNNs exhibit tuning properties as previously observed in different macaque face areas. While simulating four past experiments on a variety of CNN models, we sought the model layer that quantitatively matches the multiple tuning properties of each face area. Our results show that higher model layers explain reasonably well the properties of anterior areas, while no layer simultaneously explains the properties of middle areas, consistently across the model variations. Thus, some similarity may exist between CNNs and the primate face-processing system in the near-goal representation, but much less clearly in the intermediate stages, motivating a more comprehensive model for understanding the entire system.
Hendrikx, E.; Manns, D.; van der Stoep, N.; Testolin, A.; Zorzi, M.; Harvey, B. M.
The brain exhibits a gradual transition in responses to visual event duration and frequency through the visual processing hierarchy: from monotonically increasing to timing-tuned responses. Over their respective hierarchies, the properties of both response types are progressively transformed. Here, we implement simulations based on artificial neural networks to investigate what neural systems require for such responses and their property transformations to emerge. We find that recurrent networks develop monotonic responses whose property progressions over network layers resemble those over brain areas. Furthermore, recurrent networks can develop tuned responses, but only with training does a gradual transition between monotonic and tuned responses emerge. In particular, if this training is done on predictable sequences, the tuned-property progressions resemble those observed in the brain. These results suggest that the emergence of visual timing-tuned responses and the subsequent hierarchical transformations of these responses result from recurrent neural computation and predictive processing of sensory event timing.
Shen, X.; Li, F.; Min, B.
The ability to accumulate evidence over time for deliberate decisions is essential for both humans and animals. Decades of decision-making research have documented various types of integration kernels that characterize how evidence is temporally weighted. While numerous normative models have been proposed to explain these kernels, there remains a gap in circuit models that account for the complexity and heterogeneity of single-neuron activities. In this study, we sought to address this gap by using low-rank neural network modeling in the context of a perceptual decision-making task. First, we demonstrated that even a simple rank-one neural network model yields the diverse types of integration kernels observed in human data--including primacy, recency, and non-monotonic kernels--with a performance comparable to state-of-the-art normative models such as the drift diffusion model and the divisive normalization model. Moreover, going beyond previous normative models, this model enabled us to gain insights at two levels. At the collective level, we derived a novel explicit mechanistic expression that explains how these kernels emerge from a neural circuit. At the single-neuron level, the model exhibited heterogeneous single-neuron response kernels, resembling the diversity observed in neurophysiological recordings. In sum, we present a simple rank-one neural circuit that reproduces diverse types of integration kernels at the collective level while simultaneously capturing the complexity of single-neuron responses observed experimentally. Author summary: This study introduces a simple rank-one neural network model that replicates the diverse integration kernels--such as primacy and recency--observed in human decision-making tasks. The model performs comparably to normative models like the drift diffusion model but offers novel insights by linking neural circuit dynamics to these kernels.
Additionally, it captures the heterogeneity of single-neuron responses, resembling the diversity observed in experimental data. This work bridges the gap between decision-making models and the complexity of neural activity, offering a new perspective on how evidence is integrated in the brain.
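The link between a rank-one recurrence and an integration kernel can be made concrete in a minimal sketch. This assumes a one-dimensional (rank-one) linear read-out with an illustrative effective gain `lam`, not the paper's fitted model: iterating k <- lam * k + u_t for T steps weights the evidence sample at time t by lam ** (T - 1 - t), so lam < 1 yields recency and lam > 1 yields primacy.

```python
import numpy as np

def integration_kernel(lam, T):
    """Weight of each evidence sample on the final decision variable."""
    return lam ** np.arange(T - 1, -1, -1)

def decision_variable(evidence, lam):
    """Run the rank-one recurrence over an evidence sequence."""
    k = 0.0
    for u in evidence:
        k = lam * k + u   # projection of the rank-one recurrent update
    return k
```

Non-monotonic kernels require going slightly beyond this scalar picture (e.g., a time-varying or nonlinear gain), which is where the circuit-level derivation in the paper comes in.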
Mollard, S.; Bohte, S.; Roelfsema, P.
Natural scenes usually contain many objects that need to be segregated from each other and the background. Object-based attention is the process that groups image fragments belonging to the same objects. Curve-tracing tasks provide a special case, testing our ability to group image elements of an elongated curve. In the brain, curve-tracing is associated with the gradual spread of enhanced neuronal activity over the representation of the traced curve. Previous studies demonstrated that the tracing speed is higher if curves are far apart than if they are nearby. One hypothesis is that a larger distance between curves permits activity propagation in higher visual cortex areas. In these higher areas receptive fields are larger and connections exist between neurons representing image regions that are farther apart (Pooresmaeili et al., 2014). We propose a recurrent architecture for the scale-invariant tracing of curves and objects. The architecture is composed of a feedforward pathway that dynamically selects the appropriate scale for tracing, and a recurrent pathway for propagating enhanced neuronal activity through horizontal and feedback connections, enabled by a disinhibitory loop involving VIP and SOM interneurons. We trained the network using a biologically plausible reinforcement learning scheme and observed that training on short curves allowed the network to generalize to longer curves and 2D objects. The network chose the scale based on the distance between curves and the width of objects, just as in human psychophysics and the visual cortex of monkeys. The results provide a mechanistic account of the learning and execution of multiscale perceptual grouping in the brain.
Gao, Y.
An important open question in computational neuroscience is how various spatially tuned neurons, such as place cells, are used to support the learning of reward-seeking behavior in an animal. Existing computational models either lack biological plausibility or fall short of behavioral flexibility when environments change. In this paper, we propose a computational theory that achieves behavioral flexibility with better biological plausibility. We first train a mixture of Gaussian distributions to model the ensemble of firing fields of place cells. Then we propose a Hebbian-like rule to learn the synaptic strength matrix among place cells. This matrix is interpreted as the transition rate matrix of a continuous-time Markov chain to generate the sequential replay of place cells. During replay, the synaptic strengths from place cells to medium spiny neurons (MSNs) are learned by a temporal-difference-like rule to store place-reward associations. After replay, MSN activation ramps up as the animal approaches the rewarding place, so the animal can move along the direction in which MSN activation increases to find the rewarding place. We implement our theory in a high-fidelity virtual rat in the MuJoCo physics simulator. In a complex maze, the rat shows significantly better learning efficiency and behavioral flexibility than a rat that implements a neuroscience-inspired reinforcement learning algorithm, deep Q-network.
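The replay-generation step can be sketched in simplified form. This is an illustrative stand-in for the theory above, not its implementation: 1D Gaussian place fields, a Hebbian co-activation matrix reinterpreted as a discrete-time Markov transition matrix (the paper uses a continuous-time chain), and a replay sequence sampled from that chain.

```python
import numpy as np

def hebbian_transition_matrix(centers, sigma=1.0):
    """Co-activation of Gaussian place fields, normalized into transitions.

    centers: (n,) 1D place-field centers.
    """
    d = centers[:, None] - centers[None, :]
    w = np.exp(-d ** 2 / (2 * sigma ** 2))   # Hebbian co-activation strength
    np.fill_diagonal(w, 0.0)                  # no self-transitions
    return w / w.sum(axis=1, keepdims=True)   # rows become probabilities

def sample_replay(P, start, steps, rng):
    """Sample a replay trajectory of place-cell indices from the chain."""
    seq = [start]
    for _ in range(steps):
        seq.append(rng.choice(len(P), p=P[seq[-1]]))
    return seq
```

Because co-activation decays with distance between fields, sampled replay sequences tend to step between neighboring place cells, producing the spatially coherent trajectories along which the place-to-MSN associations can then be trained.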
Dujmovic, M.; Bowers, J.; Adolfi, F.; Malhotra, G.
Representational Similarity Analysis (RSA) is an innovative approach used to compare neural representations across individuals, species, and computational models. Despite its popularity within neuroscience, psychology, and artificial intelligence, this approach has led to difficult-to-reconcile and contradictory findings, particularly when comparing primate visual representations with deep neural networks (DNNs). Here, we demonstrate how such contradictory findings can arise from incorrect inferences about mechanism when comparing complex systems processing high-dimensional stimuli. In a series of studies comparing computational models, primate cortex, and human cortex, we find two problematic phenomena: a "mimic effect", where confounds in stimuli can lead to high RSA-scores between provably dissimilar systems, and a "modulation effect", where RSA-scores become dependent on the stimuli used for testing. Since our results bear on a number of influential findings, we provide recommendations to avoid these pitfalls and sketch a way forward to a more solid science of representation in cognitive systems.
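The mimic effect can be demonstrated in a few lines. The toy below is an illustration of the general pitfall, not the paper's experiments: two provably different systems (one reads out feature f1, the other f2) produce near-identical RDMs whenever the stimulus set confounds the two features, and diverge once the features are decorrelated.

```python
import numpy as np

def rdm(responses):
    """Pairwise dissimilarity of scalar responses: |r_i - r_j|."""
    return np.abs(responses[:, None] - responses[None, :])

def rsa_score(rdm_a, rdm_b):
    """Correlation between the upper triangles of two RDMs."""
    iu = np.triu_indices(len(rdm_a), k=1)
    return np.corrcoef(rdm_a[iu], rdm_b[iu])[0, 1]

# System A reads out feature f1; system B reads out feature f2.
f1 = np.linspace(0.0, 1.0, 8)
f2_confounded = 2.0 * f1   # f2 perfectly correlated with f1 across the stimuli
f2_controlled = np.array([0.3, 0.9, 0.1, 0.7, 0.5, 0.2, 0.8, 0.4])

score_confounded = rsa_score(rdm(f1), rdm(f2_confounded))  # spuriously high
score_controlled = rsa_score(rdm(f1), rdm(f2_controlled))  # reveals the difference
```

The stimulus-set dependence of `score_controlled` versus `score_confounded` is also a minimal instance of the modulation effect: the same pair of systems receives different RSA-scores depending on which stimuli are used for testing.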